Statistics
Statistics is a sub-field of mathematics that deals with the collection and interpretation of data. In statistics, a testable hypothesis is key for interpreting the results. The null hypothesis is the hypothesis that is tested against. Quantitative data is collected and analyzed against the null hypothesis. If the test generates a p-value less than or equal to the chosen significance level (conventionally 0.05), then the null hypothesis is rejected. If the p-value is greater than 0.05, then the null hypothesis cannot be rejected. Failing to reject the null hypothesis does not show that it is true, only that the data do not provide sufficient evidence against it.

History
Statistics has been utilized by humans since antiquity. The earliest recorded use of the mathematics that would become the foundation of statistics was from Mesopotamia. http://ancienthistory.about.com/od/abacus/a/BabylonianMath.htm The word "statistics" comes from the German word Statistik, referring to the study of the political state, which in turn derives from the Modern Latin statisticum collegium, a course of lectures on state affairs. http://www.etymonline.com/index.php?search=Statistics&searchmode=none Both senses reflect the common use of statistical methods at that time to collect and classify data, usually of populations from censuses.

Statistics grew into a recognized field of mathematics in the mid-17th century. The title of "founder" of statistics is contested among several men. Willcox, W.F. (1938). The founder of statistics. Revue de l'Institut International de Statistique/Review of the International Statistical Institute, 5(4), 321-328. These men are Hermann Conring, Gottfried Achenwall, John Graunt, and William Petty. Conring and his successor, Achenwall, were the first to lecture on the subject at the university level. While Conring was the first to teach statistics, he taught only in Latin; it was Achenwall who translated his ideas into German and made statistics available to the general public. Graunt and Petty wrote one of the earliest texts on statistics.

Terminology

Commonly Used Symbols
\bar{X} : sample mean
n : sample size
\hat{\sigma} : estimated standard deviation
s : sample standard deviation
μ0 : the mean value being tested
H0 : null hypothesis
Ha : alternative hypothesis

Key Terms
Alternative hypothesis: A statement that something is happening. It can state that the assumed status quo is false, that there is a relationship, or that there is a difference between variables. It is usually what the statistician wants to support.
Categorical variable: A variable where each individual fits into only one category. For example, when categorizing eye color, there is a finite number of color groups found in the population (blue, green, brown, hazel, other). Tests of categorical variables examine how many individuals, and what percentage, fall into each category.
Explanatory variable: Also known as the independent variable. The variable that can explain or cause differences in the response variable.
Null hypothesis: A statement that nothing is happening: there are no relationships and no differences. This hypothesis can be thought of as the 'status quo'. Usually, the statistician wants to disprove or reject the null hypothesis.
Ordinal variable: A categorical variable whose categories are ordered. One common example is the scoring scale seen in personal evaluations, where 1 = poor and 5 = excellent.
p-value: The probability of obtaining a test statistic at least as extreme as the observed test statistic, assuming that the null hypothesis is true. In most statistical tests, a p-value less than or equal to 0.05 is conventionally taken as sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis.
Quantitative variable: Data where the values are numeric measurements or counts.
Response variable: Also known as the dependent variable or the outcome variable. The variable that is affected by the explanatory variable.
Utts, J.M., & Heckard, R.F. (2012). "Mind on statistics."
Boston: Cengage Learning.
Type I Error: Occurs when the null hypothesis is actually true but is rejected, leading the statistician to wrongly conclude that the alternative hypothesis is true.
Type II Error: Occurs when the alternative hypothesis is actually true but the statistician fails to reject the null hypothesis.

Testing Data

Hypothesis Testing
Also known as significance testing, this form of statistics uses data from a sample to determine whether a statement (i.e., a hypothesis) about a population may or may not be true. A null hypothesis is generated and tested to determine whether it can be rejected.

Common Tests

Mean, Median, and Mode

Mean
The average of a quantitative data set.
: \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i

Median
The middle value of an ordered data set. If the data set has an even number of values, then the median is the average of the two middle values.

Mode
The most frequently occurring value in a data set.

T-Test
This statistical test is used both to study a single population and to compare two different populations against each other.

One-Tailed T-Test
The one-tailed T-test is used to test the null hypothesis that the population mean is equal to a specific mean, against an alternative in one direction.
H0: μ = μ0 versus Ha: μ < μ0
H0: μ = μ0 versus Ha: μ > μ0
For Ha: μ < μ0, the p-value is the area below t
For Ha: μ > μ0, the p-value is the area above t
: t = \frac{\overline{x} - \mu_0}{s/\sqrt{n}}
df = n − 1

Two-Tailed T-Test
A two-tailed test is used when the alternative hypothesis allows a difference in either direction.
H0: μ = μ0 versus Ha: μ ≠ μ0
For Ha: μ ≠ μ0, the p-value is 2 × the area above |t|
When comparing two samples of equal size n and equal variance, the null hypothesis is that the two population means are equal, and the test statistic is
: t = \frac{\bar{X}_1 - \bar{X}_2}{s_{X_1X_2} \cdot \sqrt{\frac{2}{n}}}
where
: s_{X_1X_2} = \sqrt{\frac{1}{2}(s_{X_1}^2 + s_{X_2}^2)}
df = 2n − 2

Paired T-Test
This test is used when comparing paired data.
H0: μd = 0 versus Ha: μd > 0 or Ha: μd < 0 or Ha: μd ≠ 0
: t = \frac{\overline{X}_D - \mu_0}{s_D/\sqrt{n}}
Here \overline{X}_D and s_D are the mean and standard deviation of the paired differences, n is the number of pairs, and μ0 is the hypothesized mean difference (0 for the hypotheses above).
df = n − 1
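
The three t statistics above can be sketched in Python using only the standard library. The function names (one_sample_t, two_sample_t_equal_n, paired_t) are hypothetical and chosen for illustration. The sketch returns only the test statistic and the degrees of freedom; turning t into a p-value needs a t-distribution table or a statistics package, which is left out here.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t, df) for testing H0: mu = mu0 with a single sample."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    t = (x_bar - mu0) / (s / math.sqrt(n))
    return t, n - 1


def two_sample_t_equal_n(sample1, sample2):
    """Return (t, df) for two samples of equal size and assumed equal variance."""
    n = len(sample1)
    if len(sample2) != n:
        raise ValueError("this formula assumes equal sample sizes")
    # s_{X1X2}: square root of the average of the two sample variances
    pooled_sd = math.sqrt(
        0.5 * (statistics.variance(sample1) + statistics.variance(sample2))
    )
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / (
        pooled_sd * math.sqrt(2 / n)
    )
    # df = 2n - 2 for this equal-size, pooled-variance form
    return t, 2 * n - 2


def paired_t(before, after, mu0=0.0):
    """Return (t, df) for paired data, testing H0: mean difference = mu0."""
    # A paired test is a one-sample test on the per-pair differences.
    diffs = [a - b for a, b in zip(after, before)]
    return one_sample_t(diffs, mu0)
```

For example, one_sample_t([1, 2, 3, 4, 5], 2) gives a t statistic of about 1.414 with 4 degrees of freedom. The sign of t then decides which tail is relevant for a one-tailed alternative.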

Z-test for a Proportion
This method of analyzing data can be used when there is a sufficiently large random sample and the null value p0 is the hypothesized value of the true population proportion p. The test statistic has approximately a standard normal distribution and takes the general form
: Z = \frac{X - \mu_0}{s}
where X is the sample statistic, μ0 is the null value, and s is the standard error under the null hypothesis. For a proportion, X is the sample proportion \hat{p}, μ0 is p0, and s = \sqrt{p_0(1 - p_0)/n}.
The p-value is found by using a standard normal curve and determining the area under the curve for the corresponding z test statistic. When the difference between the sample statistic and the null value is large relative to the standard error, the z value will also be large. If this occurs in the direction of the alternative hypothesis and the resulting p-value is less than or equal to 0.05, the null hypothesis is rejected in favor of the alternative hypothesis.
Conditions for using the z-test for a proportion are:
1. The sample should be a random sample from the population, or the data should come from a binomial experiment with independent trials
2. The quantities np0 and n(1 − p0) should both be greater than or equal to 10

Chi-Square Statistic
The chi-square statistic is used to examine the association between two categorical variables. The larger the chi-square value, the stronger the evidence of an association between the two variables.
: \chi^2 = \sum \frac{(\operatorname{observed} - \operatorname{expected})^2}{\operatorname{expected}}
The sum is taken over all cells of the table of counts.
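
As a minimal sketch, the z-test for a proportion and the chi-square statistic described above could be computed in Python with the standard library alone. The function names and the alternative parameter are hypothetical. The expected counts are assumed to be (row total × column total) / grand total, the usual choice for a two-way table, which the text above does not spell out.

```python
import math
from statistics import NormalDist


def z_test_proportion(successes, n, p0, alternative="two-sided"):
    """Return (z, p_value) for testing H0: p = p0."""
    # Sample-size condition from the text: n*p0 and n*(1 - p0) at least 10.
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("n*p0 and n*(1-p0) should both be at least 10")
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)  # computed under H0
    z = (p_hat - p0) / standard_error
    normal = NormalDist()  # standard normal curve
    if alternative == "less":
        p_value = normal.cdf(z)  # area below z
    elif alternative == "greater":
        p_value = 1 - normal.cdf(z)  # area above z
    else:
        p_value = 2 * (1 - normal.cdf(abs(z)))  # twice the area above |z|
    return z, p_value


def chi_square_statistic(observed):
    """Return the chi-square statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Assumed expected count under no association.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (obs - expected) ** 2 / expected
    return chi2
```

For instance, z_test_proportion(60, 100, 0.5) gives z of about 2.0 and a two-sided p-value of about 0.0455, so the null hypothesis p = 0.5 would be rejected at the 0.05 level.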